-
The assumption across nearly all language model (LM) tokenization schemes is that tokens should be subwords, i.e., contained within word boundaries. While providing a seemingly reasonable inductive bias, is this common practice limiting the potential of modern LMs? Whitespace is not a reliable delimiter of meaning, as evidenced by multi-word expressions (e.g., "by the way"), crosslingual variation in the number of words needed to express a concept (e.g., "spacesuit helmet" in German is "Raumanzughelm"), and languages that do not use whitespace at all (e.g., Chinese). To explore the potential of tokenization beyond subwords, we introduce a "superword" tokenizer, SuperBPE, which incorporates a simple pretokenization curriculum into the byte-pair encoding (BPE) algorithm to first learn subwords, then superwords that bridge whitespace. This brings dramatic improvements in encoding efficiency: when fixing the vocabulary size to 200k, SuperBPE encodes a fixed piece of text with up to 33% fewer tokens than BPE on average. In experiments, we pretrain 8B transformer LMs from scratch while fixing the model size, vocabulary size, and train compute, varying *only* the algorithm for learning the vocabulary. Our model trained with SuperBPE achieves an average +4.0% absolute improvement over the BPE baseline across 30 downstream tasks (including +8.2% on MMLU), while simultaneously requiring 27% less compute at inference time. In analysis, we find that SuperBPE results in segmentations of text that are more uniform in per-token difficulty. Qualitatively, this may be because SuperBPE tokens often capture common multi-word expressions that function semantically as a single unit. SuperBPE is a straightforward, local modification to tokenization that improves both encoding efficiency and downstream performance, yielding better language models overall.
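The two-stage curriculum described in this abstract can be illustrated with a toy sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the helper names (`apply_merge`, `learn_merges`, `pretokenize`, `encode`, `train_two_stage`), character-level rather than byte-level symbols, splitting on single spaces, and the tie-breaking rule. It is not the authors' SuperBPE code; it only shows the idea of learning merges inside whitespace-delimited words first, then lifting that restriction so later merges can bridge whitespace.

```python
from collections import Counter

def apply_merge(seq, pair):
    """Replace each adjacent occurrence of `pair` in `seq` with the joined token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_merges(seqs, num_merges):
    """Greedy BPE: repeatedly merge the most frequent adjacent pair.
    Pairs are only counted inside a sequence, never across sequences."""
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        # Assumed tie-break: highest count, then lexicographically largest pair.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        seqs = [apply_merge(seq, best) for seq in seqs]
    return merges

def pretokenize(text):
    """Assumed whitespace pretokenizer: each word keeps its leading space."""
    words = text.split(" ")
    return [words[0]] + [" " + w for w in words[1:]]

def encode(text, subword_merges, superword_merges=()):
    """Apply subword merges within words, then superword merges across them."""
    tokens = []
    for word in pretokenize(text):
        seq = list(word)
        for pair in subword_merges:
            seq = apply_merge(seq, pair)
        tokens.extend(seq)
    for pair in superword_merges:
        tokens = apply_merge(tokens, pair)
    return tokens

def train_two_stage(docs, n_subword, n_superword):
    """Stage 1: merges confined to words. Stage 2: merges over whole documents."""
    words = [list(w) for d in docs for w in pretokenize(d)]
    sub = learn_merges(words, n_subword)
    doc_seqs = [encode(d, sub) for d in docs]
    sup = learn_merges(doc_seqs, n_superword)
    return sub, sup

# Toy corpus built around the multi-word expression cited in the abstract.
docs = ["by the way"] * 4
sub, sup = train_two_stage(docs, n_subword=10, n_superword=5)
```

On this toy corpus, `encode("by the way", sub)` yields three tokens after the first stage, and `encode("by the way", sub, sup)` yields a single token once the second-stage merges are applied. The corpus is far too small to say anything about the efficiency figures reported in the abstract.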
-
We study the problem of finding a (pure) product state with optimal fidelity to an unknown n-qubit quantum state ρ, given copies of ρ. This is a basic instance of a fundamental question in quantum learning: is it possible to efficiently learn a simple approximation to an arbitrary state? We give an algorithm which finds a product state with fidelity ε-close to optimal, using N = n^{poly(1/ε)} copies of ρ and poly(N) classical overhead. We further show that estimating the optimal fidelity is NP-hard for error ε = 1/poly(n), showing that the error dependence cannot be significantly improved. For our algorithm, we build a carefully-defined cover over candidate product states, qubit by qubit, and then demonstrate that extending the cover can be reduced to approximate constrained polynomial optimization. For our proof of hardness, we give a formal reduction from polynomial optimization to finding the closest product state. Together, these results demonstrate a fundamental connection between these two seemingly unrelated questions. Building on our general approach, we also develop more efficient algorithms in three simpler settings: when the optimal fidelity exceeds 5/6; when we restrict ourselves to a discrete class of product states; and when we are allowed to output a matrix product state.
-
We have examined the origins of polytype selection during metal-mediated molecular-beam epitaxy of GaN nanowires (NWs). High-angle annular dark-field scanning transmission electron microscopy reveals [111]-oriented zinc blende (ZB) NWs and [0001]-oriented wurtzite (WZ) NWs, with Si_xN_y at the interface between individual NWs and the Si (001) substrate. Quantitative energy dispersive x-ray spectroscopy reveals a notably higher Si concentration in ZB NWs (7.0% ± 2.3%) than in WZ NWs (2.3% ± 1.2%). Meanwhile, density functional theory calculations show that incorporation of 8 at. % Si on the Ga sublattice inverts the difference in formation energies between WZ and ZB GaN, such that the ZB polytype of GaN is stabilized. This identification of Si and other ZB polytype stabilizers will enable the development of polytype heterostructures in a wide variety of WZ-preferring compounds.
-
Planar magnetic microswimmers are well-suited for in vivo biomedical applications due to their cost-effective mass production through standard photolithography techniques. The precise control of their motion in diverse environments is a critical aspect of their application. This study demonstrates the control of these swimmers individually and as a swarm, exploring navigation through channels and showcasing their functional capabilities for future biomedical settings. We also introduce the capability of microswimmers for surface motion, complementing their traditional fluid-based propulsion and extending their functionality. Our research reveals that microswimmers with varying magnetization directions exhibit unique trajectory patterns, enabling complex swarm tasks. This study further delves into the behavior of these microswimmers in intricate environments, assessing their adaptability and potential for advanced applications. The findings suggest that these microswimmers could be pivotal in areas such as targeted drug delivery and precision medical procedures, marking significant progress in the biomedical and micro-robotic fields and offering new insights into their control and behavior in diverse environments.
-
We consider the question of Gaussian mean testing, a fundamental task in high-dimensional distribution testing and signal processing, subject to adversarial corruptions of the samples. We focus on the relative power of different adversaries, and show that, in contrast to the common wisdom in robust statistics, there exists a strict separation between adaptive adversaries (strong contamination) and oblivious ones (weak contamination) for this task. Specifically, we resolve both the information-theoretic and computational landscapes for robust mean testing. In the exponential-time setting, we establish the tight sample complexity of testing $\mathcal{N}(0, I)$ against $\mathcal{N}(\alpha v, I)$, where $\|v\|_2 = 1$, with an $\varepsilon$-fraction of adversarial corruptions, to be $\tilde{\Theta}\left(\max\left(\frac{\sqrt{d}}{\alpha^2}, \frac{d\varepsilon^3}{\alpha^4}, \min\left(\frac{d^{2/3}\varepsilon^{2/3}}{\alpha^{8/3}}, \frac{d\varepsilon}{\alpha^2}\right)\right)\right)$, while the complexity against adaptive adversaries is $\tilde{\Theta}\left(\max\left(\frac{\sqrt{d}}{\alpha^2}, \frac{d\varepsilon^2}{\alpha^4}\right)\right)$, which is strictly worse for a large range of vanishing $\varepsilon, \alpha$. To the best of our knowledge, ours is the first separation in sample complexity between the strong and weak contamination models. In the polynomial-time setting, we close a gap in the literature by providing a polynomial-time algorithm against adaptive adversaries achieving the above sample complexity $\tilde{\Theta}\left(\max\left(\frac{\sqrt{d}}{\alpha^2}, \frac{d\varepsilon^2}{\alpha^4}\right)\right)$, and a low-degree lower bound (which complements an existing reduction from planted clique) suggesting that all efficient algorithms require this many samples, even in the oblivious-adversary setting.
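To make the stated separation concrete, the two rates can be evaluated numerically. The sketch below is only arithmetic on the expressions quoted in the abstract: it drops the constants and logarithmic factors hidden by the Θ~ notation, so the returned values are orders of magnitude and not actual sample counts. The function names are hypothetical, and reading the first expression as the oblivious-adversary bound is an inference from the abstract's contrast with the adaptive case.

```python
import math

def oblivious_rate(d, eps, alpha):
    """First rate in the abstract (read here as the oblivious-adversary bound),
    with constants and log factors dropped."""
    return max(
        math.sqrt(d) / alpha**2,
        d * eps**3 / alpha**4,
        min(d ** (2 / 3) * eps ** (2 / 3) / alpha ** (8 / 3), d * eps / alpha**2),
    )

def adaptive_rate(d, eps, alpha):
    """Rate against adaptive adversaries, with constants and log factors dropped."""
    return max(math.sqrt(d) / alpha**2, d * eps**2 / alpha**4)

# Illustrative parameters chosen for this sketch, not taken from the paper.
d, eps, alpha = 10**6, 0.01, 0.1
gap = adaptive_rate(d, eps, alpha) / oblivious_rate(d, eps, alpha)
```

For these parameters `gap` is greater than 1, so the adaptive rate is the larger of the two. The adaptive expression is never smaller than the oblivious one for ε ≤ 1: the term d ε³/α⁴ is at most d ε²/α⁴, and d^{2/3} ε^{2/3}/α^{8/3} is the weighted geometric mean (√d/α²)^{2/3} (d ε²/α⁴)^{1/3} of the two adaptive terms, so it cannot exceed their maximum.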
-
Meteoroids of sub‐milligram sizes burn up high in the Earth's atmosphere and cause streaks of plasma trails detectable by meteor radars. The altitude at which these trails, or meteors, form depends on a number of factors including atmospheric density and the astronomical source populations from which these meteoroids originate. A previous study has shown that the altitude of these meteors is affected by long‐term linear trends and the 11‐year solar cycle related to changes in our atmosphere. In this work, we examine how shorter diurnal and seasonal variations in the altitude distribution of meteors are dependent on the geographical location at which the measurements are performed. We use meteoroid altitude data from 18 independent meteor radar stations at a broad range of latitudes and investigate whether there are local time (LT) and seasonal variations in the altitude of the peak meteor height, defined as the majority detection altitude of all meteors within a certain period, which differ from those expected purely from the variation in the visibility of their astronomical source. We find a consistent LT and seasonal response for the Northern Hemisphere locations regardless of latitude. However, the Southern Hemisphere locations exhibit much greater LT and seasonal variation. In particular, we find a complex response in the four stations located within the Southern Andes region, which indicates that the strong dynamical atmospheric activity, such as the gravity waves prevalent here, disrupts and masks the seasonality and dependence on the astronomical sources.
-
A very high‐spatial resolution (∼21–23 m pixel at 85 km altitude) OH airglow imager at the Andes Lidar Observatory at Cerro Pachón, Chile observed considerable ducted wave activity on the night of 29–30 October 2016. This instrument was collocated with a Na wind‐temperature lidar that provided data revealing the occurrence of strong ducts. A large field of view OH and greenline airglow imager showed waves present over a vertical extent consistent with the altitudes of the ducting features identified in the lidar profiles. While waves that appeared to be ducted were seen in all imagers throughout the observation interval, the wave train seen in the OH images at earlier times had a distinct leading nonsinusoidal phase followed by several, lower‐amplitude, more sinusoidal phases, suggesting a likely bore. The leading phase exhibited significant dissipation via small‐scale secondary instabilities suggesting vortex rings that progressed rapidly to smaller scales and turbulence (the latter not fully resolved) thereafter. The motions of these small‐scale features were consistent with their location in the duct at or below ∼83–84 km. Bore dissipation caused a momentum flux divergence and a local acceleration of the mean flow within the duct along the direction of the initial bore propagation. A number of these features are consistent with mesospheric bores observed or modeled in previous studies.
-
As evidence grows supporting the importance of non-cognitive factors in learning, computer-assisted learning platforms increasingly incorporate non-academic interventions to influence student learning and learning-related behaviors. Non-cognitive interventions often attempt to influence students’ mindset, motivation, or metacognitive reflection to impact learning behaviors and outcomes. In the current paper, we analyze data from five experiments, involving seven treatment conditions embedded in mastery-based learning activities hosted on a computer-assisted learning platform focused on middle school mathematics. Each treatment condition embodied a specific non-cognitive theoretical perspective. Over seven school years, 20,472 students participated in the experiments. We estimated the effects of each treatment condition on students’ response time, hint usage, likelihood of mastering knowledge components, learning efficiency, and post-test performance. Our analyses reveal a mix of both positive and negative treatment effects on student learning behaviors and performance. Few interventions impacted learning as assessed by the post-tests. These findings highlight the difficulty in positively influencing student learning behaviors and outcomes using non-cognitive interventions.